ABSTRACT

A method of classifying multisource data 
in remote sensing is presented. The pro- 
posed method considers each data source as 
an information source providing a body of 
evidence, represents statistical evidence by 
interval-valued probabilities, and uses 
Dempster's rule to integrate information 
based on multiple data sources. 

The method is applied to the problems of 
ground-cover classification of multispectral 
data combined with digital terrain data such 
as elevation, slope, and aspect. Then this 
method is applied to simulated 201-band 
High Resolution Imaging Spectrometer 
(HIRIS) data by dividing the dimensionally 
huge data source into smaller and more 
manageable pieces based on the global sta- 
tistical correlation information. It produces 
higher classification accuracy than the 
Maximum Likelihood (ML) classification 
method when the Hughes phenomenon is 
apparent. 


1 INTRODUCTION 

The importance of utilizing multisource 
data in ground-cover classification lies in 
the fact that it is generally correct to as- 
sume that improvements in terms of classi- 
fication accuracy can be achieved at the ex- 
pense of additional independent features 
provided by separate sensors. However, it 
should be recognized that information and 
knowledge from most available data sources 
in the real world are neither certain nor 
complete. We refer to such a body of uncer-
tain, incomplete, and sometimes inconsis-
tent information as “evidential informa-
tion.”

The objective of the current research is 
to develop a mathematical framework 
within which various applications can be 
made with multisource data in remote 
sensing and geographic information sys- 
tems. The methodology described in this 
paper has evolved from “evidential reason- 
ing,” where each data source is considered 
as providing a body of evidence with a cer- 
tain degree of belief. The degrees of belief 
based on the body of evidence are repre- 
sented by “interval-valued (IV) probabili- 
ties” rather than by conventional point- 
valued probabilities so that uncertainty can 
be embedded in the measures. 

There are three fundamental problems 
in the multisource data analysis based on IV 
probabilities: (1) how to represent bodies of 
evidence by IV probabilities, (2) how to 
combine IV probabilities to give an overall 
assessment of the combined body of evi- 
dence, and (3) how to make decisions based 
on IV probabilities. 

The paper describes a formal method of
representing statistical evidence by IV
probabilities based on the Likelihood
Principle. In order to integrate informa-
tion obtained from individual data sources,
the method presented in the paper uses
Dempster’s rule for combining multiple
bodies of evidence. Although IV probabili-
ties together with Dempster's rule provide
an innovative means for the representation
and combination of evidential information,
they make the decision process rather
complicated. We need more intelligent
strategies for making decisions. This paper
also focuses on the development of decision
rules over IV probabilities.

2 AXIOMATIC DEFINITION OF
IV PROBABILITY

Interval-valued probabilities can be
thought of as a generalization of ordinary
point-valued probabilities. The endpoints of 
IV probabilities are called the “upper prob- 
ability” and the “lower probability.” 

There have been various works intro- 
ducing the concepts of IV probabilities in 
the areas of philosophy of science and 
statistics [1][2][3][4]. Although the mathe-
matical rationales behind those approaches
are different, there are some properties of 
IV probabilities which are commonly re- 
quired. The axiomatic approach to IV prob- 
abilities is based on those common proper- 
ties, so that it can avoid conceptual ambi- 
guities. 

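DEFINITION [5] Suppose $\Theta$ is a finite set of
exhaustive and mutually exclusive events.
Let $\mathcal{B}$ denote a Boolean algebra of the subsets
of $\Theta$. The IV probability $[L, U]$ is defined by
the set-theoretic functions:

$$ L : \mathcal{B} \to [0, 1] \tag{2.1} $$

$$ U : \mathcal{B} \to [0, 1] \tag{2.2} $$

satisfying the following properties:

I) $$ U(A) \ge L(A) \ge 0 \quad \text{for any } A \in \mathcal{B} \tag{2.3} $$

II) $$ U(\Theta) = L(\Theta) = 1 \tag{2.4} $$

III) $$ U(A) + L(\bar{A}) = 1 \quad \text{for any } A \in \mathcal{B} \tag{2.5} $$

IV) For any $A, B \in \mathcal{B}$ with $A \cap B = \emptyset$,

$$ L(A) + L(B) \le L(A \cup B) \le L(A) + U(B) \le U(A \cup B) \le U(A) + U(B) \tag{2.6} $$
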


Given a system of IV probabilities over $\mathcal{B}$,
the actual probability measure, $P(A)$, of any
subset $A$ of $\Theta$ is assumed to lie in the interval
$[L(A), U(A)]$ such that

$$ L(A) \le P(A) \le U(A) \tag{2.7} $$

The degree of uncertainty about the actual
probability of $A$ is represented by the width,
$U(A) - L(A)$, of the interval. In particular,
$U(A) = L(A) = P(A)$ when there is complete
knowledge of the probability of $A$. In this
case, the IV probability becomes an ordi-
nary additive probability. And $L(A) + L(\bar{A}) = 0$
when there is absolutely no knowledge of
the probability of $A$; by property III the
interval is then $[0, 1]$.

The basic probability assignment $m$ de-
fined in Shafer’s mathematical theory of
evidence [6] has the following relations with
the IV probabilities:

$$ L(A) = \sum_{B \subseteq A} m(B) \tag{2.8} $$

$$ m(A) = \sum_{B \subseteq A} (-1)^{|A - B|} L(B) \quad \text{for all } A \subseteq \Theta \tag{2.9} $$

$$ U(A) = \sum_{B \cap A \neq \emptyset} m(B) \tag{2.10} $$
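
As an illustration of Eqs. (2.8) through (2.10), the following minimal Python sketch computes lower and upper probabilities from a basic probability assignment and recovers a mass by the inversion in Eq. (2.9). The three-element frame, the class labels, and the mass values are hypothetical and chosen only for this example; subsets are represented as `frozenset` objects.

```python
from itertools import combinations

def powerset(frame):
    """All subsets of the frame as frozensets, including the empty set."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def lower(m, a):
    """Eq. (2.8): sum of m(B) over focal elements B contained in A."""
    return sum(mass for b, mass in m.items() if b <= a)

def upper(m, a):
    """Eq. (2.10): sum of m(B) over focal elements B that intersect A."""
    return sum(mass for b, mass in m.items() if b & a)

def mass_from_lower(low, a):
    """Eq. (2.9): recover m(A) from the lower probabilities of the subsets of A."""
    return sum((-1) ** len(a - b) * low[b] for b in powerset(a))

# Hypothetical frame and basic probability assignment (masses sum to one).
frame = frozenset({"fir", "pine", "clearing"})
m = {frozenset({"fir"}): 0.5,
     frozenset({"fir", "pine"}): 0.3,
     frame: 0.2}

a = frozenset({"fir", "pine"})
low = {s: lower(m, s) for s in powerset(frame)}

print("L(A) =", lower(m, a))                 # mass of {fir} plus mass of {fir, pine}
print("U(A) =", upper(m, a))                 # every focal element intersects A
print("m(A) =", mass_from_lower(low, a))     # should reproduce the assigned 0.3
```

In this toy case the interval for A is wide for the singleton {"pine"} and narrow for A itself, which is the sense in which the interval width expresses uncertainty.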


3 REPRESENTATION OF STATISTICAL 
EVIDENCE BY IV PROBABILITY 

When a body of evidence is based on the 
outcomes of statistical experiments known 
to be governed by any probability model, it 
is called “statistical evidence.” One of the 
basic problems for any theory of IV prob- 
abilities is how to represent a given body of 
statistical evidence by IV probabilities. 

DEFINITION [6] An upper probability
function $U$ is said to be “consonant” if its
focal elements are nested, i.e., if for $A_i \subseteq \Theta$
($i = 1, \ldots, r$) such that $m(A_i) > 0$ for all $i$ and
$\sum_{i=1}^{r} m(A_i) = 1$, $A_i \subset A_j$ for any $i < j$,
where $m$ is the basic probability assignment of $U$.

Suppose the observed data in a statistical
experiment are governed by a probability
model $\{p_\theta : \theta \in \Theta\}$, where $p_\theta$ is a conditional
probability density function on a sample
space $X$ given $\theta$. Our intuitive feeling is that
an observation $x \in X$ seems more likely to
belong to those elements of $\Theta$ which assign
the greater chance to $x$.

Based on the above intuition along with 
the consonance assumption of the upper 
probability function, Shafer[6] proposed the 
linear plausibility function defined as: 
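
$$ U(A \mid x) = \frac{\max_{\theta \in A} p_\theta(x)}{\max_{\theta \in \Theta} p_\theta(x)} \quad \text{for all } A \in \mathcal{B} \text{ and } A \neq \emptyset \tag{3.1} $$

The corresponding lower probability func-
tion is given as:

$$ L(A \mid x) = 1 - \frac{\max_{\theta' \in \bar{A}} p_{\theta'}(x)}{\max_{\theta \in \Theta} p_\theta(x)} \quad \text{for all } A \in \mathcal{B} \tag{3.2} $$

In particular, when the set $A$ is a singleton,
say $\{\theta'\}$, the function in Eq. (3.1) gives the
relative likelihood of $\theta'$ to the most likely
element in $\Theta$.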
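
The two functions of this section can be sketched in a few lines of Python. The sketch assumes that the class-conditional densities have already been evaluated at one observation x and stored in a dictionary; the class names and density values are hypothetical.

```python
def upper_given_x(likelihood, a):
    """Eq. (3.1): largest likelihood inside A relative to the largest over the frame."""
    if not a:
        return 0.0                      # Eq. (3.1) is stated for non-empty A only
    return max(likelihood[t] for t in a) / max(likelihood.values())

def lower_given_x(likelihood, a):
    """Eq. (3.2): one minus the upper probability of the complement of A."""
    complement = set(likelihood) - set(a)
    return 1.0 - upper_given_x(likelihood, complement)

# Hypothetical densities p_theta(x) for one pixel x (at least one must be positive).
likelihood = {"wheat": 0.08, "fallow": 0.02, "unknown": 0.04}

for theta in likelihood:
    a = {theta}
    print(theta, [lower_given_x(likelihood, a), upper_given_x(likelihood, a)])
```

Only the most likely class receives a positive lower probability here, while every class with a nonzero density keeps a positive upper probability, which reflects the consonance assumption.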

4 DEMPSTER'S RULE FOR 
COMBINING IV PROBABILITIES 

Dempster’s rule is a generalized scheme 
of Bayesian inference to aggregate bodies of 
evidence provided by multiple information 
sources. Let $m_1$ and $m_2$ be the basic prob-
ability assignments associated respectively
with the belief functions $\mathrm{Bel}_1$ and $\mathrm{Bel}_2$,
which are inferred from two entirely dis-
tinct bodies of evidence $E_1$ and $E_2$. For all $A_i$,
$B_j$, and $X_k \subseteq \Theta$, Dempster’s rule (or Dempster’s
orthogonal sum) gives a new belief func-
tion denoted by

$$ \mathrm{Bel} = \mathrm{Bel}_1 \oplus \mathrm{Bel}_2 \tag{4.1} $$

The basic probability assignment associated
with the new belief function is defined as:

$$ m(X_k) = (1 - \kappa)^{-1} \sum_{A_i \cap B_j = X_k} m_1(A_i)\, m_2(B_j) \quad \text{for any } X_k \neq \emptyset \tag{4.2} $$

where $\kappa$ is the measure of conflict between
$\mathrm{Bel}_1$ and $\mathrm{Bel}_2$ defined as:

$$ \kappa = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j) \tag{4.3} $$

Dempster’s rule computes the basic
probability of $X_k$, $m(X_k)$, from the product
of $m_1(A_i)$ and $m_2(B_j)$ by considering all $A_i$
and $B_j$ whose intersection is $X_k$. Once $m$ is
computed for every $X_k \subseteq \Theta$, the belief func-
tion is obtained by the sum of $m$’s committed
to $X_k$ and its subsets. The denominator $(1 - \kappa)$
normalizes the result to compensate for the
measure committed to the empty set so that
the total probability mass has measure one.
Consequently, Dempster’s rule discards the
conflict between $E_1$ and $E_2$ and carries their
consensus to the new belief function.

Dempster’s rule is both commutative and 
associative. Therefore, the order or group- 
ing of evidence in combination does not 
affect the result, and a sequence of infor- 
mation sources can be combined either 
sequentially or pairwise. 
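
The combination in Eqs. (4.2) and (4.3) can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments; the function name `dempster_combine`, the frame, and the mass values are assumptions made for the example, and focal elements are represented as `frozenset` objects.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments by Dempster's rule.

    Each argument maps frozenset focal elements to masses that sum to one.
    Returns the combined assignment and the measure of conflict.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            x = a & b
            if x:
                combined[x] = combined.get(x, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b      # product falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence: the rule is undefined")
    # Normalize by (1 - conflict) so that the combined masses sum to one.
    return {x: v / (1.0 - conflict) for x, v in combined.items()}, conflict

# Hypothetical frame and assignments from two distinct sources.
frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, frame: 0.4}
m2 = {frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.3, frame: 0.2}

m12, conflict = dempster_combine(m1, m2)
m21, _ = dempster_combine(m2, m1)                # same result in the other order

print("conflict:", conflict)
for focal, mass in sorted(m12.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 4))
```

Because the rule is commutative and associative, more than two sources can be folded in one at a time with the same function, passing the previous combined assignment as the first argument.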

5 DECISION RULES FOR 
IV PROBABILITIES [7] 

Consider a classification problem where
an arbitrary pattern $x \in X$ from an unknown
class is assigned to one of $n$ classes in $\Theta$. Let
$\lambda(\theta_i \mid \theta_j)$ be a measure of the “loss” incurred
when the decision $\theta_i$ is made and the true
pattern class is in fact $\theta_j$, where $i, j = 1, \ldots, n$.
Also, let $\delta(x)$ denote a decision rule that tells
which class to choose for every pattern $x$.
We define the “upper expected loss” and the
“lower expected loss” of making a decision
$\delta(x) = \theta_i$ as:

$$ \ell_i^{*}(x) = \sum_{j=1}^{n} \lambda(\theta_i \mid \theta_j)\, U_x(\theta_j) \tag{5.1} $$

$$ \ell_{*i}(x) = \sum_{j=1}^{n} \lambda(\theta_i \mid \theta_j)\, L_x(\theta_j) \tag{5.2} $$

where $U_x(\theta_j)$ and $L_x(\theta_j)$ are respectively the upper
and the lower probabilities for $x$ being
actually from $\theta_j$.

The “Bayes-like rule” is the one which
minimizes both the upper and the lower
expected losses, i.e.,

$$ \delta(x) = \theta_i \quad \text{if } \ell_{*i}(x) \le \ell_{*j}(x) \text{ and } \ell_i^{*}(x) \le \ell_j^{*}(x) \quad \text{for } j = 1, \ldots, n \tag{5.3} $$

A problem with the above decision rule is
that there does not always exist a $\theta_i$ which sat-
isfies the condition in Eq. (5.3), which can
lead to ambiguity. In such an ambiguous
situation, one may withhold the decision
and wait for a new piece of information.
Otherwise, the ambiguity may be resolved
by resorting to the following rule, the so-called
“minimum average expected loss rule”:

$$ \delta(x) = \theta_i \quad \text{if } \frac{\ell_i^{*}(x) + \ell_{*i}(x)}{2} \le \frac{\ell_j^{*}(x) + \ell_{*j}(x)}{2} \quad \text{for } j = 1, \ldots, n \tag{5.4} $$

As an alternative to the Bayes-like rule,
there are two other rules by which a deci-
sion is made according to individual mea-
sures of the interval, that is, either the up-
per expected loss or the lower expected loss:

(A) minimum upper expected loss rule:

$$ \delta(x) = \theta_i \quad \text{if } \ell_i^{*}(x) \le \ell_j^{*}(x) \quad \text{for } j = 1, \ldots, n \tag{5.5} $$

(B) minimum lower expected loss rule:

$$ \delta(x) = \theta_i \quad \text{if } \ell_{*i}(x) \le \ell_{*j}(x) \quad \text{for } j = 1, \ldots, n \tag{5.6} $$

Although the above two rules always pro-
duce decisions and there is no ambiguous
situation in making a decision according to
the rules, they do not utilize all of the in-
formation represented by the IV probabili-
ties. The performance of these rules is
compared with that of the minimum average
expected loss rule in the experiments by
applying them to problems of ground-cover
classification based on remotely sensed and
geographic data.
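
The four decision rules of this section can be compared on a small example. The sketch below assumes a 0-1 loss matrix and a hypothetical set of upper and lower probabilities for three classes; the function names are illustrative. With these numbers no class minimizes both expected losses, so the Bayes-like rule is ambiguous while the other three rules still return a decision.

```python
def expected_losses(loss, upper, lower):
    """Eqs. (5.1)-(5.2): upper and lower expected loss of each decision i."""
    n = len(loss)
    up = [sum(loss[i][j] * upper[j] for j in range(n)) for i in range(n)]
    lo = [sum(loss[i][j] * lower[j] for j in range(n)) for i in range(n)]
    return up, lo

def argmin(values):
    """Index of the smallest value (the first one in case of ties)."""
    return min(range(len(values)), key=values.__getitem__)

def bayes_like(up, lo):
    """Eq. (5.3): a class minimizing both losses, or None when none exists."""
    for i in range(len(up)):
        if all(up[i] <= u for u in up) and all(lo[i] <= l for l in lo):
            return i
    return None

def mael(up, lo):
    """Eq. (5.4): minimum average expected loss."""
    return argmin([(u + l) / 2 for u, l in zip(up, lo)])

def muel(up, lo):
    """Eq. (5.5): minimum upper expected loss."""
    return argmin(up)

def mlel(up, lo):
    """Eq. (5.6): minimum lower expected loss."""
    return argmin(lo)

# Hypothetical interval-valued probabilities of three classes for one pixel.
upper = [0.6, 0.5, 0.3]
lower = [0.2, 0.4, 0.0]
# 0-1 loss (an assumption): no loss for a correct decision, unit loss otherwise.
loss = [[0 if i == j else 1 for j in range(3)] for i in range(3)]

up, lo = expected_losses(loss, upper, lower)
print("Bayes-like:", bayes_like(up, lo))
print("MAEL:", mael(up, lo), "MUEL:", muel(up, lo), "MLEL:", mlel(up, lo))
```

Under the 0-1 loss, minimizing the upper expected loss amounts to choosing the class with the largest upper probability, and minimizing the lower expected loss to choosing the class with the largest lower probability, which is why the two rules can disagree.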


6 EXPERIMENTAL RESULTS 

The experiments have been performed 
over two different image data sets. In the 
experiments, the classification accuracies of 
the multisource data (MSD) classification 
based on the proposed method were com- 
pared with those of Maximum Likelihood 
(ML) classifications based on the stacked 
vector approach. 

Table 1 describes the set of data sources 
for the first experiment. The image in this 
data set consists of 256 lines by 256 columns 
and covers a forestry site around the 
Anderson River area of British Columbia, 
Canada. Source 1 is 11-band Airborne
Multispectral Scanner (A/B MSS) data.
Sources 2 and 3 are Synthetic Aperture
Radar (SAR) imagery in Shallow mode and
Steep mode, respectively. Sources 4 through
6 provide digital terrain data.

In this experiment, 6 classes were de-
fined as listed in Table 2, and 100 pixels per
class were used for training data, which is
between 4% and 8% of the total pixels of the
classes in the test fields. The training sam-
ples are uniformly distributed over the test
fields so that they may be considered as good
representatives of the total samples.


Table 1. Anderson River Data Set.

Source  Data         Spectral  Input      Spectral
Index   Type         Region    Channel    Band (µm)
------  -----------  --------  ---------  -----------
1       A/B MSS      Visible   1          0.38 - 0.42
                               2          0.42 - 0.45
                               3          0.45 - 0.50
                               4          0.50 - 0.55
                               5          0.55 - 0.60
                               6          0.60 - 0.65
                               7          0.65 - 0.69
                     Near IR   8          0.70 - 0.79
                               9          0.80 - 0.89
                               10         0.92 - 1.10
                     Thermal   11         8 - 14
2       SAR Shallow            XHV
                               XHH
                               LHV
                               LHH
3       SAR Steep              XHV
                               XHH
                               LHV
                               LHH
4       Topographic            Elevation
5       Topographic            Aspect
6       Topographic            Slope


Table 2. Information Classes for Test of
Anderson River Data Set.

Class  Cover                          Tree       No. of  % of
Index  Types                          Sizes      Pixels  Total
-----  -----------------------------  ---------  ------  -----
1      Douglas Fir 2 (df2)            31 - 40 m    2246  21.72
2      Douglas Fir 3 (df3)            21 - 30 m    1501  14.52
3      DF+Other Species 2 (df+os2)    31 - 40 m    1352  13.08
4      DF+Lodgepole Pine 2 (df+lp2)   21 - 30 m    1589  15.37
5      Hemlock+Cedar (hc)             31 - 40 m    1587  15.35
6      Forest Clearings (fc)          -            2064  19.96
Total                                             10339  100.0

We have observed that some of the
classes defined in Table 2 cannot be assumed
to be normally distributed in the topo- 
graphic data. Thus it was decided to adopt a 
nonparametric approach such as the 
“Nearest Neighbor” (NN) method [8] in 
computing probability measures while the 
optical and radar data sources were assumed 
to have Gaussian probability density func- 
tions. 

First, the ML classification based on the
stacked vector approach was carried out for
various sets of the data sources, adding one
source at a time to the A/B MSS data in the
order Elevation, SAR-Shallow, SAR-Steep, 
Aspect, and Slope. Then the MSD classifica- 
tion based on the proposed method was per- 
formed using different decision rules. 
Tables 3 and 4 compare the results for the 
training samples and the test samples, re- 
spectively. Even though the compounded 
data in the ML classification were treated as 
having Gaussian distributions, the ML and 
the MSD methods produced similar results 
for the training samples. This is not sur-
prising because the ML method uses con-
ventional additive probabilities assuming
that the knowledge concerning the actual 
unknown probabilities is complete, which is 
reasonable as far as the training samples 
are concerned. 


Table 3. Results of Classifications over
Training Samples of Anderson River Data.
Entries are classification accuracies (%).

        Decision  Sources
Method  Rule      1      1, 4   1, 2, 4  1 - 4  1 - 5  1 - 6
------  --------  -----  -----  -------  -----  -----  -----
ML                82.50  88.67  91.67    92.00  92.83  93.50
MSD     MUEL      -      89.83  92.00    92.50  93.17  94.33
        MLEL      -      88.67  91.17    91.33  92.33  93.67
        MAEL      -      88.50  91.00    91.67  91.67  93.50


Comparing the performance of the three
decision rules, the minimum upper expected
loss (MUEL) rule was superior to the other
rules, the minimum lower expected loss
(MLEL) rule and the minimum average ex-
pected loss (MAEL) rule: it gave the highest
accuracy for every combination of sources
in Tables 3 and 4. It is not known in
general which rule is the best. Further
research is needed to determine whether
guidelines can be devised for selection of
the decision rule.


Table 4. Results of Classifications over Test
Samples of Anderson River Data.
Entries are classification accuracies (%).

        Decision  Sources
Method  Rule      1      1, 4   1, 2, 4  1 - 4  1 - 5  1 - 6
------  --------  -----  -----  -------  -----  -----  -----
ML                74.16  77.77  79.13    78.93  79.80  81.01
MSD     MUEL      -      80.60  82.39    82.69  83.02  84.54
        MLEL      -      78.45  81.42    81.67  82.24  83.65
        MAEL      -      78.21  80.95    82.05  81.88  83.16


In the second experiment, the proposed 
method was applied to the classification of 
HIRIS data by decomposing the data into 
smaller pieces, i.e., subsets of spectral 
bands. The data set used in this experiment 
is simulated HIRIS data obtained by RSSIM
[9]. RSSIM is a simulation tool for the study
of multispectral remotely sensed images and 
associated system parameters. It creates 
realistic multispectral images based on de- 
tailed models of the ground surface, the at- 
mosphere, and the sensor. Table 5 provides 
a description of the simulated HIRIS data set. 

Figure 1 is a visual representation of the 
global statistical correlation coefficient 
matrix of the data. The image is produced by 
converting the absolute values of coeffi- 
cients to gray values between 0 and 255. 
Based on the correlation image, the 201 
bands were divided into 3 groups in such a 
way that intra-correlation is maximized and 
inter-correlation is minimized. Table 6 de- 
scribes the multisource data set after divi- 
sion. Note that the spectral regions of the 
input channels in Source 3 coincide with 
the water absorption bands. 
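
The construction of the correlation image can be sketched as follows. The sketch assumes that the “global statistical correlation coefficient” is the ordinary Pearson correlation computed between bands over all pixels, which the text does not state explicitly, and it uses a tiny hypothetical data set of four pixels and three bands in place of the 201-band image. It covers only the conversion to gray values; the grouping of bands is not specified in enough detail here to be reproduced.

```python
from math import sqrt

def correlation_matrix(samples):
    """Pearson correlation between bands.

    samples is a list of pixels, each a list of band values. Every band is
    assumed to have nonzero variance; otherwise the division below fails.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[b] for s in samples) / n for b in range(d)]
    centered = [[s[b] - mean[b] for b in range(d)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(d)]
           for i in range(d)]
    return [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]

def to_gray(corr):
    """Map absolute correlation values in [0, 1] to gray values in 0..255."""
    return [[round(255 * abs(c)) for c in row] for row in corr]

# Hypothetical pixels: band 1 is exactly twice band 0, band 2 varies differently.
samples = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1]]

gray = to_gray(correlation_matrix(samples))
for row in gray:
    print(row)
```

Bright blocks along the diagonal of such an image indicate runs of adjacent bands that are highly correlated with one another, which is the visual cue used for dividing the bands into sources.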

With 225 training samples (a third of the 
total samples) for each class, the ML classi- 
fication and the MSD classification using 
the minimum upper expected loss rule were 
performed over the total samples for vari- 
ous sets of the sources, and the results are 
listed in Table 7. 

The results of the ML method apparently
show effects of the Hughes phenomenon;
the accuracy goes down as the dimensional-
ity of the source increases while the num-
ber of training samples is fixed. In particu-
lar, the accuracy decreases by a consider-
able amount, from 74.56% with Sources 1
and 2 to 65.14%, when all features are used.
Presence of the Hughes phenomenon causes
the ML method to be particularly sensitive
to a bad source, Source 3 in this case.
Meanwhile, the proposed MSD classification
method shows robust performance and
gives consistent results (77.83% and 77.63%).
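
A rough parameter count helps to see why the fixed training set becomes inadequate as the dimensionality grows. The sketch below assumes that the ML classifier fits one Gaussian with a full covariance matrix per class, which is an assumption made for this illustration; the feature counts are those of Table 6 and the 225 training samples per class are those quoted above.

```python
def gaussian_parameter_count(d):
    """Free parameters of one d-dimensional Gaussian: d means plus d(d+1)/2 covariances."""
    return d + d * (d + 1) // 2

training_per_class = 225
cases = [("S1", 115), ("S2", 60), ("S3", 26), ("S1, S2", 175), ("All", 201)]

for name, d in cases:
    params = gaussian_parameter_count(d)
    print(f"{name:7s} d={d:3d}  parameters per class={params:6d}  "
          f"samples per parameter={training_per_class / params:.3f}")
```

Under this assumption the stacked 201-band vector requires more than twenty thousand parameters per class, whereas each divided source requires far fewer, so the statistics of the smaller pieces can be estimated more reliably from the same samples. The count says nothing about the quality of a source; the low accuracy of Source 3 alone is related to its bands lying in the water absorption regions.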


Table 5. Description of Simulated HIRIS Data Set.

Name                 Finney County Data Set
Data Type            201-band HIRIS data simulated by RSSIM
Spectral Region      0.4 - 2.4 µm
Spectral Resolution  0.01 µm
Image Size           45 lines x 45 columns (2025 samples)
Information Classes  Winter Wheat, Summer Fallow, Unknown


Table 6. Divided Sources of HIRIS Data Set.

Source    Input                          No. of
Index     Channels                       Features
--------  -----------------------------  --------
Source 1  1 - 35, 107 - 141, 157 - 201   115
Source 2  36 - 95                        60
Source 3  96 - 106 (1.35 - 1.45 µm),     26
          142 - 156 (1.81 - 1.95 µm)


Table 7. Results of Classifications over Test
Samples of Simulated HIRIS Data Set.
Entries are classification accuracies (%).

        Sources
Method  S1     S2     S3     S1, S2  All
------  -----  -----  -----  ------  -----
ML      75.75  75.60  45.83  74.56   65.14
MSD     -      -      -      77.83   77.63


7 CONCLUSIONS

In this paper we have investigated how 
interval-valued probabilities can be used to 
represent and aggregate evidential infor- 
mation obtained from various data sources. 
Overall concepts of interval-valued proba- 
bilities have been employed to develop a 
new method of classifying multisource data 
in remote sensing and geographic infor- 
mation systems. The experiments demon- 
strate the ability of our method to capture 
uncertain information based on inexact and 
incomplete multiple bodies of evidence. The 
basic strategy of this method is to decompose 
the relatively large size of evidence into 
smaller, more manageable pieces, to assess 
plausibilities and supports based on each 
piece, and to combine the assessments by 
Dempster's rule. In this scheme, we are able 
to overcome the difficulty of precisely 
estimating statistical parameters, and to 
integrate statistical information as much as 
possible. 